Toxicology Behavioural Analysis Report – velocity_large

Background

Pools of zebrafish larvae were exposed to treatment chemicals for 5 days in glass petri dishes. The DMSO pool of larvae is a control group shared by all of the treatments.
Figure 1. The experimental layout of the 5-day chemical exposures in petri dishes prior to the behavioural assay

On the final exposure day, 12 larvae per treatment group are pipetted into a 96-well plate. One experiment is included in the data set, with 3 unique test chemicals and a DMSO control. There are 12 fish per dose group and 2 dose groups per chemical, giving 24 fish per chemical (except DMSO, which has 12) and 84 biological samples per plate.
Figure 2. The behavioural assay plate layout for an individual experiment. In total, 84 5-day-old embryos, 12 for each dose group, are transferred to a 96-well plate in exposure media

The behavioural data are collected in 96-well plate format using an infrared camera over a 50-minute period. The first 20 minutes allow the zebrafish embryos to acclimate to their environment, and the remaining 30 minutes consist of alternating 5-minute cycles of light and darkness. Zebrafish naturally tend to be more active in the dark.
Figure 3. A visual representation of the behavioural assay protocol
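
The timing of the protocol can be sketched in code. The helper below is hypothetical (label_phase and its argument names are not part of the original analysis) and assumes only what the text states: 20 minutes of acclimation followed by six 5-minute cycles. It does not say which cycles are light or dark, because the starting condition is not given here.

```r
# Hypothetical helper: label each one-minute observation by assay phase.
# `end` is the observation end time in seconds, as in the raw data.
label_phase <- function(end, acclimation_s = 20 * 60, cycle_s = 5 * 60) {
  # Observations ending within the first 20 minutes belong to acclimation
  in_acclimation <- end <= acclimation_s
  # Cycle index 1..6 for the 30 minutes of alternating light and darkness
  cycle <- ceiling((end - acclimation_s) / cycle_s)
  ifelse(in_acclimation, "acclimation", paste0("cycle_", cycle))
}

end_times <- seq(60, 3000, by = 60)  # 50 one-minute observations
phases <- label_phase(end_times)
table(phases)
```

Under these assumptions there are 20 acclimation observations and 5 observations in each of the 6 cycles per well.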

The infrared camera traces the swim paths of the fish during the entire experiment. A one-minute snapshot of raw swim paths looks like this:
Figure 4. An example of what the raw swim-path tracing looks like from the Viewpoint Zebrabox infrared camera and Viewpoint Zebralab software

The raw data contain many variables that we will explore once we import the data. You can browse the meta data in the next section of this report.

Directory & Meta Data

.XLS files have been converted to .csv files and are included in the directory /home/joryc/Downloads/GNU Zip Files/Collaborator_Data/Data. These are the raw data files that this EDA will be using.

Table 1. The chemical and dose assigned to each animal in the experiment (dose units are not specified)
Animal Chemical Dose
Animal01 DMSO 0
Animal02 DMSO 0
Animal03 DMSO 0
Animal04 DMSO 0
Animal05 DMSO 0
Animal06 DMSO 0
Animal07 DMSO 0
Animal08 DMSO 0
Animal09 DMSO 0
Animal10 DMSO 0
Animal11 DMSO 0
Animal12 DMSO 0
Animal13 PFOS 0.1
Animal14 PFOS 0.1
Animal15 PFOS 0.1
Animal16 PFOS 0.1
Animal17 PFOS 0.1
Animal18 PFOS 0.1
Animal19 PFOS 0.1
Animal20 PFOS 0.1
Animal21 PFOS 0.1
Animal22 PFOS 0.1
Animal23 PFOS 0.1
Animal24 PFOS 0.1
Animal25 PFOS 1
Animal26 PFOS 1
Animal27 PFOS 1
Animal28 PFOS 1
Animal29 PFOS 1
Animal30 PFOS 1
Animal31 PFOS 1
Animal32 PFOS 1
Animal33 PFOS 1
Animal34 PFOS 1
Animal35 PFOS 1
Animal36 PFOS 1
Animal37 OBS 0.1
Animal38 OBS 0.1
Animal39 OBS 0.1
Animal40 OBS 0.1
Animal41 OBS 0.1
Animal42 OBS 0.1
Animal43 OBS 0.1
Animal44 OBS 0.1
Animal45 OBS 0.1
Animal46 OBS 0.1
Animal47 OBS 0.1
Animal48 OBS 0.1
Animal49 OBS 1
Animal50 OBS 1
Animal51 OBS 1
Animal52 OBS 1
Animal53 OBS 1
Animal54 OBS 1
Animal55 OBS 1
Animal56 OBS 1
Animal57 OBS 1
Animal58 OBS 1
Animal59 OBS 1
Animal60 OBS 1
Animal61 F53B 0.1
Animal62 F53B 0.1
Animal63 F53B 0.1
Animal64 F53B 0.1
Animal65 F53B 0.1
Animal66 F53B 0.1
Animal67 F53B 0.1
Animal68 F53B 0.1
Animal69 F53B 0.1
Animal70 F53B 0.1
Animal71 F53B 0.1
Animal72 F53B 0.1
Animal73 F53B 1
Animal74 F53B 1
Animal75 F53B 1
Animal76 F53B 1
Animal77 F53B 1
Animal78 F53B 1
Animal79 F53B 1
Animal80 F53B 1
Animal81 F53B 1
Animal82 F53B 1
Animal83 F53B 1
Animal84 F53B 1
Animal85 NA NA
Animal86 NA NA
Animal87 NA NA
Animal88 NA NA
Animal89 NA NA
Animal90 NA NA
Animal91 NA NA
Animal92 NA NA
Animal93 NA NA
Animal94 NA NA
Animal95 NA NA
Animal96 NA NA

Importing the raw data files & taking a glimpse

Glimpse the raw data to see the structure of each variable, the number of observations and the class of the raw_data object

## Rows: 9,790
## Columns: 16
## $ animal    <chr> "Animal01", "Animal01", "Animal02", "Animal02", "Animal03", …
## $ Treatment <chr> "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMS…
## $ an        <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, T…
## $ start     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ end       <dbl> 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, …
## $ inact     <int> 54, 54, 65, 65, 50, 50, 60, 60, 103, 103, 82, 82, 14, 14, 69…
## $ inadur    <dbl> 18.5, 18.5, 7.7, 7.7, 7.5, 7.5, 10.2, 10.2, 24.1, 24.1, 16.2…
## $ inadist   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ smlct     <int> 175, 175, 291, 291, 251, 251, 246, 246, 196, 196, 104, 104, …
## $ smldur    <dbl> 29.5, 29.5, 25.9, 25.9, 18.2, 18.2, 26.5, 26.5, 29.6, 29.6, …
## $ smldist   <dbl> 606.0, 606.0, 698.8, 698.8, 423.4, 423.4, 575.4, 575.4, 472.…
## $ larct     <int> 139, 139, 288, 288, 262, 262, 237, 237, 160, 160, 25, 25, 38…
## $ lardur    <dbl> 6.7, 6.7, 19.8, 19.8, 16.4, 16.4, 17.5, 17.5, 6.3, 6.3, 1.2,…
## $ lardist   <dbl> 365.0, 365.0, 1139.7, 1139.7, 1119.2, 1119.2, 889.3, 889.3, …
## $ emptyct   <int> 58, 0, 118, 0, 195, 0, 61, 0, 1, 0, 1, 0, 15, 0, 2, 0, 301, …
## $ emptydur  <dbl> 5.2, 0.0, 6.6, 0.0, 17.9, 0.0, 5.8, 0.0, 0.0, 0.0, 0.1, 0.0,…
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

From the glimpse it can be seen that there are 16 variables in the tibble/data frame:

  • animal Represents individual animals in the experiment
  • Treatment The chemical and dose group
  • an Unknown/not useful
  • start Start time of the observation, in seconds
  • end End time of the observation, in seconds
  • inact Inactivity Counts | The number of times the fish went from being active to inactive over the observation time (1 minute)
  • inadur Inactivity Duration | The total time, in seconds, the fish spent inactive over the observation time (1 minute)
  • inadist Inactivity Distance | The distance travelled during inactive observations (this value should be 0)
  • smlct Small Activity Counts | The number of times the fish had a small burst of swim activity over the observation time (1 minute)
  • smldur Small Activity Duration | The duration of the small bursts of swim activity over the observation time (1 minute)
  • smldist Small Activity Distance | The distance travelled during small bursts of activity
  • larct Large Activity Counts | The number of times the fish had a large burst of swim activity over the observation time (1 minute)
  • lardur Large Activity Duration | The duration of the large bursts of swim activity over the observation time (1 minute)
  • lardist Large Activity Distance | The distance travelled during large bursts of activity
  • emptyct Counts that were neither inactive nor active (data recording artifact)
  • emptydur Duration of time the fish was neither inactive nor active (data recording artifact) | Acts almost like a confidence value: the closer it is to 60, the less reliable the data are
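
As a quick sanity check on these definitions, the four duration variables can be summed for the first few rows shown in the glimpse. The sketch below uses only values visible in that output (the FALSE rows of an for Animal01 to Animal04). The column names total_dur and sums_to_60 are introduced here for illustration. For these rows the durations add up to roughly 60 seconds, which is consistent with each row describing one minute of observation, although four rows cannot confirm this for the whole data set.

```r
# Values copied from the first four animals in the glimpse output
# (FALSE rows of `an`, which retain the empty-duration information)
first_rows <- data.frame(
  animal   = c("Animal01", "Animal02", "Animal03", "Animal04"),
  inadur   = c(18.5, 7.7, 7.5, 10.2),
  smldur   = c(29.5, 25.9, 18.2, 26.5),
  lardur   = c(6.7, 19.8, 16.4, 17.5),
  emptydur = c(5.2, 6.6, 17.9, 5.8)
)
first_rows$total_dur <- with(first_rows, inadur + smldur + lardur + emptydur)
# Allow for rounding to one decimal place in the exported values
first_rows$sums_to_60 <- abs(first_rows$total_dur - 60) < 0.15
first_rows
```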

Investigating raw_data

Checking expectations

Number of observations/rows

It is expected that the raw data will have 4800 rows because there are 96 wells and the assay is 50 minutes long, with one observation per well per minute.

nrow(raw_data)
## [1] 9790
However, there are 9790 rows present in the raw data. The expectation was violated because each observation is duplicated and some observations were recorded past the 50-minute mark.

By looking at just the head of raw_data, it can be seen that the variable an has a TRUE and FALSE row for each individual observation. The only difference between these duplicate rows is that the FALSE rows retain information about emptyct and emptydur.

raw_data <- raw_data %>%
  filter(an == FALSE) %>% # Removing duplicate rows
  select(-c(an)) # This variable is not very useful anymore, so removing
nrow(raw_data)
## [1] 4895

After filtering for just the FALSE values, there are now 4895 rows. There are 95 extra observations in raw_data (4895 - 4800) because some observations were recorded past 50 minutes.

raw_data <- raw_data %>%
  filter(end <= 3000) # Deleting observations past 50 mins (3000 seconds)
nrow(raw_data)
## [1] 4800
identical(as.numeric(nrow(raw_data)), (96 * 50 * 1)) # is the expected number of rows consistent with the observed number of rows after processing?
## [1] TRUE

After ensuring there are no observations past the 50-minute mark, there are now 4800 rows, as expected.
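
The row-count bookkeeping in this section can be reproduced with plain arithmetic. The sketch below uses only the counts reported above. The variable names are illustrative and do not come from the original script.

```r
n_wells   <- 96   # wells on the plate
n_minutes <- 50   # one observation per well per minute
expected_rows <- n_wells * n_minutes            # rows expected after cleaning

rows_imported <- 9790                           # nrow(raw_data) at import
rows_deduped  <- rows_imported / 2              # one row kept per TRUE/FALSE pair
extra_rows    <- rows_deduped - expected_rows   # observations past 3000 s

c(expected = expected_rows, deduplicated = rows_deduped, extra = extra_rows)
```

This gives 4800 expected rows, 4895 rows after removing duplicates, and 95 rows that fall past the 50-minute mark.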

Treatment-level NAs

It is expected that wells 85:96 will all be NA in the Treatment column because these are all empty wells (Figure 2). This means that the NA treatments should account for 600 observations (12 wells × 50 minutes). After removing NAs there should be 4200 rows in raw_data.

raw_data <- raw_data %>%
  filter(!is.na(Treatment)) # Removing the empty wells (NA treatments)
nrow(raw_data)
## [1] 4200
identical(as.numeric(nrow(raw_data)), (84 * 50 * 1)) # Does the expected number of rows match the observed number of rows after filtering?
## [1] TRUE
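
The expected counts for the empty wells can be checked on a toy stand-in for raw_data. This is a minimal base-R sketch, not the original pipeline: the toy data frame and the placeholder treatment label are assumptions made for illustration, and only the plate dimensions (96 wells, 50 minutes, wells 85 to 96 empty) come from the text.

```r
# Toy stand-in for raw_data: 96 wells x 50 one-minute observations
toy <- expand.grid(minute = 1:50, well = 1:96)
toy$animal <- sprintf("Animal%02d", toy$well)
# Wells 1-84 carry a treatment label (placeholder value); wells 85-96 are empty
toy$Treatment <- ifelse(toy$well <= 84, "treated_or_control", NA_character_)

n_na <- sum(is.na(toy$Treatment))          # rows belonging to empty wells
toy_kept <- toy[!is.na(toy$Treatment), ]   # base-R form of filter(!is.na(Treatment))

c(na_rows = n_na, kept_rows = nrow(toy_kept))
```

Testing for missing values with is.na() directly, rather than comparing the column against something, keeps the filter independent of the actual treatment labels.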

Variable Distributions & Outliers

Quick visualizations of the distributions of each variable are a fast and easy way to learn a lot about the data such as the range, distribution of observations, and outliers.

First, the ‘counts’ variable distributions will be investigated.
Figure 5. Quick plots of the 'counts' variables in the `raw_data` object

Figure 5 (plot 1) shows that there are some artifacts in the data. Sometimes the camera/software was not able to detect a fish even though there certainly were fish in those wells. The emptyct variable (plot 1) is a good tool for flagging observations that need to be transformed to NAs in a later section (Evaluating Data Quality).
The number of times an animal goes inactive during an observation period (inact) is approximately 60 per minute. Small swim bursts (smlct) tend to occur just over 225 times per minute. Large swim bursts (larct) occur either just under 200 times per minute or 0 times per minute. This could be due to sensitive effects of light on swim inhibition, or darkness stimulating large swim behaviours.

The distributions of all count variables are slightly skewed, with few outliers, except for the empty well counts which are very positively skewed.

Next, the ‘duration’ variable distributions will be explored. Note: these variables should never exceed 60 seconds.
Figure 6. Quick plots of the 'duration' variables in the `raw_data` object

Figure 6 reveals another red flag with the emptydur variable (plot 1 in Figure 6). There are some observations that show 60 full seconds of being empty! This is likely more than just an artifact in the recording instrument/software. These are likely dead or immobile fish that never moved at all, so the infrared camera was never able to start tracing their swim patterns (during that 60-second observation period). However, there are also some observations greater than 0 and less than 60 in this plot. In theory, if an animal is present in the well, the emptydur value should always be zero. An arbitrary threshold of 20 seconds of empty duration will be used to transform all observations (across variables) with an emptydur > 20 into NAs.

Again, the distributions of observations for each variable are slightly skewed. Note also that inadur and activedur (plots 2 and 3 in Figure 6) are approximately inversely related, as expected. As with the counts variables, large swim activity duration (lardur) clusters around two modes (0 s and ~15 s).

Behavioural perturbation (hyperactivity) suggests that these chemicals may affect the nervous system of individuals early in development. However, it is wise to be cautious and not draw any conclusions before statistical analyses have been performed.

Next, the distributions of the ‘distance’ variables will be investigated.
Figure 7. Quick plots of the 'distance' variable in the `raw_data` object

Figure 7 shows that the distributions of the ‘distance’ variables are slightly skewed, with the totaldist variable being the closest to normally distributed. This makes totaldist a promising effect endpoint to analyse.

Evaluating Data Quality

Figure 6 showed that the emptydur variable can be used reliably to filter out observations with poor data quality. The ‘empty duration’ variable ranges from 0s to 60s and indicates how long the observation was not able to detect a fish in the well. An arbitrary cutoff value of 20 seconds will be used to determine if an observation was of poor-quality and therefore, should be converted to NA. By doing this, the confidence in the accuracy of observations can be increased across the entire data set.
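
The flagging step described above can be sketched on a few toy rows. This is an assumption-laden illustration rather than the original code: the emptydur values are made up, only two endpoint columns are shown, and the poorQual column is constructed here in the way the text describes (TRUE when emptydur exceeds the 20-second cutoff).

```r
# Toy rows; emptydur values are illustrative, not taken from the data set
toy <- data.frame(
  animal   = c("Animal01", "Animal02", "Animal03", "Animal04"),
  emptydur = c(5.2, 60.0, 20.0, 20.1),
  smldist  = c(606.0, 0.0, 423.4, 575.4),
  lardist  = c(365.0, 0.0, 1119.2, 889.3)
)
empty_cutoff_s <- 20                        # arbitrary threshold from the text
endpoint_cols  <- c("smldist", "lardist")   # assumed subset of endpoint columns

# Flag observations where no fish was detected for more than 20 s
toy$poorQual <- toy$emptydur > empty_cutoff_s
# Blank the behavioural endpoints of flagged rows, keeping the rows themselves
toy[toy$poorQual, endpoint_cols] <- NA

toy
sum(toy$poorQual)
```

Keeping the flagged rows and setting only their endpoints to NA preserves the one-row-per-well-per-minute structure, so later row-count checks still hold.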

NAObservations <- raw_data %>%
  filter(poorQual == TRUE) %>% # Filter only rows with poor quality that have behavioural endpoint observations turned to NAs
  nrow()
NAObservations
## [1] 466

A total of 466 one-minute observations (rows) were transformed to NAs across all of the behavioural endpoint variables.

Overall, the animal recording set-up had a failure rate of approximately 11% (466 of 4200 observations). This is the percentage of observations in which the infrared camera failed to detect an animal that was present in a well.
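
The reported failure rate is consistent with dividing the number of flagged observations by the number of observations remaining after the empty wells were removed. The short calculation below assumes that this is how the figure was obtained. The variable names are illustrative.

```r
n_flagged <- 466    # observations converted to NA
n_total   <- 4200   # observations remaining after removing empty wells
failure_rate_pct <- 100 * n_flagged / n_total
round(failure_rate_pct, 1)
```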

Exploring & Visualizing

ggplots

Bar Graphs

Plots Binned by individual 5-minute cycles
PFOS Mean & SE

OBS Mean & SE

F53B Mean & SE

PFOS Median & SE

OBS Median & SE

F53B Median & SE

PFOS Mean & SD

OBS Mean & SD

F53B Mean & SD

PFOS Median & SD

OBS Median & SD

F53B Median & SD

AOV Results Binned 5-mins

Plots Binned by cycle type (light or dark - 2 bins - 15mins each)
PFOS Mean & SE

OBS Mean & SE

F53B Mean & SE

PFOS Median & SE

OBS Median & SE

F53B Median & SE

PFOS Mean & SD

OBS Mean & SD

F53B Mean & SD

PFOS Median & SD

OBS Median & SD

F53B Median & SD

AOV Results Light-Dark Bins

Plots Not Binned by cycle
PFOS Mean & SE

OBS Mean & SE

F53B Mean & SE

PFOS Mean & SD

OBS Mean & SD

F53B Mean & SD

PFOS Median & SE

OBS Median & SE

F53B Median & SE

PFOS Median & SD

OBS Median & SD

F53B Median & SD

AOV Results



Dunnett_Results_Table

Trace Paths (line plots)

Mean w/ SE

Mean w/ SD

Median w/ SE

Median w/ SD